Contributor

@phacops phacops commented Oct 30, 2025

We want to start storing array values in EAP. These arrays can have different element types (arrays of ints, arrays of floats, etc.), so this PR uses the new JSON column type in ClickHouse to store them.
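As a rough illustration of the kind of data this targets, the sketch below builds one JSON object holding arrays with different element types and labels each array. The attribute names and the `element_type` helper are assumptions made for this example; they are not taken from the EAP schema or from Snuba.

```python
import json


def element_type(values: list) -> str:
    """Label the element type of an array, or "mixed" if the types differ."""
    kinds = set()
    for v in values:
        # bool is a subclass of int in Python, so it has to be checked first.
        if isinstance(v, bool):
            kinds.add("bool")
        elif isinstance(v, int):
            kinds.add("int")
        elif isinstance(v, float):
            kinds.add("float")
        elif isinstance(v, str):
            kinds.add("string")
        else:
            kinds.add("other")
    if not kinds:
        return "empty"
    return kinds.pop() if len(kinds) == 1 else "mixed"


# Hypothetical attribute names; each key would be one path in the JSON column.
attributes_array = {
    "retry_delays_ms": [100, 200, 400],
    "sample_rates": [0.25, 0.5, 1.0],
    "tags": ["checkout", "mobile"],
}

# Round-trip through JSON text, as the value would travel on the wire.
payload = json.dumps(attributes_array)
types_by_path = {k: element_type(v) for k, v in json.loads(payload).items()}
```

Each key maps to an array of a single element type here, which is the case the description calls out (arrays of ints, arrays of floats).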

@phacops phacops requested review from a team as code owners October 30, 2025 22:48

github-actions bot commented Oct 30, 2025

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py ()

-- start migrations

-- forward migration events_analytics_platform : 0050_add_attributes_array_column
Local op: ALTER TABLE eap_items_1_local ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Distributed op: ALTER TABLE eap_items_1_dist ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Local op: ALTER TABLE eap_items_1_downsample_8_local ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Local op: ALTER TABLE eap_items_1_downsample_64_local ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Local op: ALTER TABLE eap_items_1_downsample_512_local ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ADD COLUMN IF NOT EXISTS attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1)) AFTER attributes_float_39;
-- end forward migration events_analytics_platform : 0050_add_attributes_array_column




-- backward migration events_analytics_platform : 0050_add_attributes_array_column
Distributed op: ALTER TABLE eap_items_1_dist DROP COLUMN IF EXISTS attributes_array;
Local op: ALTER TABLE eap_items_1_local DROP COLUMN IF EXISTS attributes_array;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist DROP COLUMN IF EXISTS attributes_array;
Local op: ALTER TABLE eap_items_1_downsample_8_local DROP COLUMN IF EXISTS attributes_array;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist DROP COLUMN IF EXISTS attributes_array;
Local op: ALTER TABLE eap_items_1_downsample_64_local DROP COLUMN IF EXISTS attributes_array;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist DROP COLUMN IF EXISTS attributes_array;
Local op: ALTER TABLE eap_items_1_downsample_512_local DROP COLUMN IF EXISTS attributes_array;
-- end backward migration events_analytics_platform : 0050_add_attributes_array_column
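The statements above follow a regular pattern: the forward migration touches each table's local variant before its distributed one, and the backward migration reverses that order. The sketch below reproduces the pattern as plain string generation. The helper names are hypothetical, and this is not how Snuba's migration framework produces its SQL.

```python
# Base table names as they appear in the generated SQL above.
TABLES = [
    "eap_items_1",
    "eap_items_1_downsample_8",
    "eap_items_1_downsample_64",
    "eap_items_1_downsample_512",
]

COLUMN_DEF = "attributes_array JSON(max_dynamic_paths=128) CODEC (ZSTD(1))"


def forward_ops(tables=TABLES):
    """ADD COLUMN statements: local table first, then distributed."""
    ops = []
    for table in tables:
        for suffix in ("local", "dist"):
            ops.append(
                f"ALTER TABLE {table}_{suffix} ADD COLUMN IF NOT EXISTS "
                f"{COLUMN_DEF} AFTER attributes_float_39;"
            )
    return ops


def backward_ops(tables=TABLES):
    """DROP COLUMN statements: distributed table first, then local."""
    ops = []
    for table in tables:
        for suffix in ("dist", "local"):
            ops.append(
                f"ALTER TABLE {table}_{suffix} DROP COLUMN IF EXISTS attributes_array;"
            )
    return ops
```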


codecov bot commented Oct 30, 2025

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 240 | 1 | 239 | 6 |
View the top 1 failed test(s) by shortest run time
tests.clickhouse.test_native::test_concurrency_limit
Stack Traces | 0.012s run time
Traceback (most recent call last):
  File "/.venv/lib/python3.11........./site-packages/_pytest/runner.py", line 341, in from_call
    result: TResult | None = func()
                             ^^^^^^
  File "/.venv/lib/python3.11........./site-packages/_pytest/runner.py", line 242, in <lambda>
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
           ^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11.../site-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11....../site-packages/_pytest/threadexception.py", line 92, in pytest_runtest_call
    yield from thread_exception_runtest_hook()
  File "/.venv/lib/python3.11....../site-packages/_pytest/threadexception.py", line 68, in thread_exception_runtest_hook
    yield
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11....../site-packages/_pytest/unraisableexception.py", line 95, in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
  File "/.venv/lib/python3.11....../site-packages/_pytest/unraisableexception.py", line 70, in unraisable_exception_runtest_hook
    yield
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11....../site-packages/_pytest/logging.py", line 846, in pytest_runtest_call
    yield from self._runtest_for(item, "call")
  File "/.venv/lib/python3.11....../site-packages/_pytest/logging.py", line 829, in _runtest_for
    yield
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11.../site-packages/_pytest/capture.py", line 880, in pytest_runtest_call
    return (yield)
            ^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11.../site-packages/_pytest/skipping.py", line 257, in pytest_runtest_call
    return (yield)
            ^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11........./site-packages/_pytest/runner.py", line 174, in pytest_runtest_call
    item.runtest()
  File "/.venv/lib/python3.11....../site-packages/_pytest/python.py", line 1627, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/.venv/lib/python3.11....../site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/_pytest/python.py", line 159, in pytest_pyfunc_call
    result = testfunction(**testargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../tests/clickhouse/test_native.py", line 82, in test_concurrency_limit
    assert connection.execute.call_count == 2, "Expected two attempts"
AssertionError: Expected two attempts
assert 1 == 2
 +  where 1 = <Mock name='mock.execute' id='139830884398608'>.call_count
 +    where <Mock name='mock.execute' id='139830884398608'> = <Mock id='139830889465744'>.execute
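The failed assertion counts how many times a mocked `execute` was called. For readers unfamiliar with that pattern, here is a minimal sketch of how such a count reaches 2 when the first attempt fails and is retried. `TooManyQueries` and `execute_with_one_retry` are hypothetical names; this is not the code exercised by `test_concurrency_limit`.

```python
from unittest.mock import Mock


class TooManyQueries(Exception):
    """Hypothetical stand-in for a concurrency-limit error."""


def execute_with_one_retry(connection, query):
    """Run the query, retrying exactly once if the first attempt hits the limit."""
    try:
        return connection.execute(query)
    except TooManyQueries:
        return connection.execute(query)


connection = Mock()
# With an iterable side_effect, exception instances are raised and other
# values are returned, one per call: the first call fails, the second succeeds.
connection.execute.side_effect = [TooManyQueries("limit reached"), "ok"]

result = execute_with_one_retry(connection, "SELECT 1")
attempts = connection.execute.call_count
```

In this sketch `attempts` is 2. A count of 1, as in the traceback above, would mean the first call did not trigger a second attempt.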


Column(
    "attributes_array",
    JSON(
        max_dynamic_paths=4096,
Member

This number is way too high. It would create an additional 4096 columns on each table, almost 14000 columns in total, and I'm fairly certain we will reach it. All it would take is 4096 differently named values across our entire customer base.

Member

I would recommend setting it quite low, like 32, and letting the shared structure take care of the rest.
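To make the trade-off concrete, here is a toy model of the overflow behaviour: up to `max_dynamic_paths` distinct paths get a dedicated subcolumn, and any further paths land in the shared structure. It assumes paths are admitted in first-seen order, which is a simplification. ClickHouse applies the limit per block of stored data (for example per data part) and uses its own rules to pick which paths stay dedicated. The `split_paths` function is hypothetical.

```python
def split_paths(paths, max_dynamic_paths):
    """Split distinct paths into dedicated subcolumns and shared overflow."""
    dedicated, shared = [], []
    seen = set()
    for path in paths:
        if path in seen:
            continue  # a repeated path does not consume another slot
        seen.add(path)
        if len(dedicated) < max_dynamic_paths:
            dedicated.append(path)
        else:
            shared.append(path)
    return dedicated, shared


# 100 hypothetical attribute names arriving with a limit of 32.
paths = [f"attr_{i}" for i in range(100)]
dedicated, shared = split_paths(paths, max_dynamic_paths=32)
```

With a limit of 32, the first 32 names get their own subcolumn and the remaining 68 go to the shared structure. Raising the limit moves more names into dedicated subcolumns at the cost of more columns per table.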

Contributor Author

@phacops phacops Nov 5, 2025

The default is 1024, and ClickHouse is OK with a higher number of columns. I'm not sure we need to keep it as low as 32.

We want to take advantage of the shared structure, but we'll need to upgrade to 25.8 first. We should at least keep the default value.

@phacops phacops requested a review from volokluev November 5, 2025 16:36
@phacops phacops merged commit 5ca4152 into master Nov 6, 2025
33 checks passed
@phacops phacops deleted the pierre/use-json-column-in-eap-items branch November 6, 2025 22:22